Abstract

To support developers in writing reliable and efficient concurrent programs, novel concurrent programming abstractions have been proposed in recent years. Programming with such abstractions requires new analysis tools because the execution semantics often differs considerably from established models. We present a record-replay technique for programs written in SCOOP, an object-oriented programming model for concurrency. The resulting tool enables developers to reproduce the nondeterministic execution of a concurrent program, a necessary prerequisite for debugging and testing.

1 Introduction

Avoiding concurrency-specific errors such as data races and deadlocks is still the responsibility of developers in most languages that provide synchronization through concurrency libraries.

To avoid the problems of the library approach, a number of languages have been proposed that fully integrate synchronization mechanisms. SCOOP (Simple Concurrent Object-Oriented Programming) [6, 10], an object-oriented programming model for concurrency, is one of them.



The main idea of SCOOP is to simplify the writing of correct concurrent programs by allowing developers to use familiar concepts from object-oriented programming while protecting them from common concurrency errors such as data races. Empirical evidence supports the claim that SCOOP indeed simplifies reasoning about concurrent programs compared with more established models [?].

The complex interactions between concurrent components make it difficult to analyze the behavior of typical concurrent programs. Effective use of a programming model therefore requires tools that help developers analyze and improve programs. Static analysis of models, e.g., [2, 7, 11, 15], can establish some degree of functional correctness. However, it cannot explain why a particular execution does not terminate. Once a problem has been identified, it may be difficult to reproduce, because it might manifest itself only under particular interleavings. Worse, the act of debugging might make the problem go away, because the observation instructions change the interleaving. The term Heisenbug is sometimes used to denote this phenomenon. Addressing these issues requires adapting record-replay techniques to the context of concurrent, nondeterministic execution. Section 2 surveys existing tools that address this goal; they are, however, not appropriate for the semantics of SCOOP.

We present a SCOOP adaptation of Choi and Srinivasan's [3] record-replay technique for Java threads. The resulting tool has been integrated into the EVE development environment, which we extended with support for SCOOP. We found that the SCOOP model provides abstractions that the technique can leverage: SCOOP's synchronization mechanism provides abstractions that are coarse-grained enough to limit state space explosion and thus keep execution records small.
This article is structured as follows. Section 2 provides an overview of related work. Section 3 gives an overview of the SCOOP model. Section 4 presents the adapted record-replay technique. Section 5 concludes with an outlook on future work.



2 Related Work 

The main problem in debugging concurrent programs is to make concurrent executions repeatable, and a number of approaches to this problem have emerged. The approach of Pan and Linton logs all data read from shared memory locations; to replay, it simulates the events from the log. While this approach has the advantage of allowing immediate reverse execution of a program (backstepping), its main drawback is the prohibitively large amount of data generated during execution, as acknowledged by [12].

Most approaches have, as a consequence, focused on recording only the order of events, not the data; in a second step, this information is used to replay the execution. The predominant approaches can be classified according to the type of information recorded: either only coarse-grained information such as object accesses and synchronization events [5, 14], or every shared-memory access [9]. LeBlanc and Mellor-Crummey [5] describe a method termed Instant Replay that records the order of accesses to shared objects during a monitoring phase by assigning version numbers to objects and recording, for each process, which object versions have been accessed. This recording of object accesses ensures that during replay, processes access objects with the same version numbers as during monitoring, thus reproducing both the execution and the object values. Tai et al. [14] consider programs in which all shared objects are protected by synchronization mechanisms. They record the order of the synchronization operations, so that during replay the execution can be recreated under the assumption that the program is free of data races. Netzer [9] proposes monitoring every shared-memory access so that data-race freedom no longer needs to be assumed. The technique is optimized with regard to the amount of information needed to reproduce an execution: it performs a transitive reduction of the dependencies between shared-memory accesses and records only the optimal ordering, thus significantly reducing the size of the trace log. A drawback of the approach, however, is its large runtime overhead, as pointed out in [13].

Bacon and Goldstein [1] present a hardware-assisted scheme for deterministic replay. In contrast to the software-based methods, the scheme avoids the complications of the probe effect. Xu et al. [16] develop this approach further, using a variant of the transitive reduction to minimize log size.

Instead of relying on a log of application events, as the previously discussed approaches usually do, Russinovich and Cogswell [13] recreate program executions by logging the thread switches caused by the system scheduler. They modify the operating system to generate a log that can recreate the thread switches upon replay. Choi and Srinivasan [3] further improve this approach by logging logical thread schedules, which represent equivalence classes of physical thread schedules with respect to the ordering of shared-memory access events. Our record-replay approach is based on logical thread schedules and adapts this idea to the context of SCOOP.



3 Background 

This section gives an overview of SCOOP. The starting idea of SCOOP is that every object is associated for its lifetime with a processor, called its handler. A processor is an autonomous thread of control capable of executing actions on objects. An object's class describes the possible actions as features. A processor can be a CPU, but it can also be implemented in software, for example as a process or as a thread; any mechanism that can execute instructions sequentially is suitable as a processor.

A variable x belonging to a processor can point to an object with the same handler (non-separate object) or to an object on another processor (separate object). In the first case, a feature call x.f is non-separate: the handler of x executes the feature synchronously. In this context, x is called the target of the feature call. In the second case, the feature call is separate: the handler of x, i.e., the supplier, executes the call asynchronously on behalf of the requester, i.e., the client. The possibility of asynchronous calls is the main source of concurrent execution. The asynchronous nature of separate feature calls implies a distinction between a feature call and a feature application: the client logs the call with the supplier (feature call) and moves on; only at some later time will the supplier actually execute the body (feature application).

The producer-consumer problem serves as a simple illustration of these ideas. A root class defines the entities producer, consumer, and buffer. Assume that each object is handled by its own processor. The discussion can then be simplified by using a single name to refer both to an object and its handler; for example, "producer" refers both to the producer object and to its handler.

producer: separate PRODUCER
consumer: separate CONSUMER
buffer: separate BUFFER [INTEGER]

The keyword separate specifies that the referenced objects may be handled by a processor 
different from the current one. A creation instruction on a separate entity such as producer will 
create an object on another processor; by default the instruction also creates that processor. 

Both the producer and the consumer access an unbounded buffer in feature calls such as buffer.put (n) and buffer.item. To ensure exclusive access, the consumer must lock the buffer before accessing it. Such locking requirements of a feature must be expressed in its formal argument list: any target of separate type within the feature must occur as a formal argument; this ensures that the arguments' handlers are locked for the duration of the feature execution, thus preventing data races. Such targets are called controlled. For instance, in consume, buffer is a formal argument, so the consumer has exclusive access to the buffer while executing consume.

Condition synchronization relies on preconditions (after the require keyword) to express wait conditions. Any precondition of the form x.some_condition makes the execution of the feature wait until the condition is true. For example, the precondition of consume delays the execution until the buffer is not empty. As the buffer is unbounded, the corresponding producer feature does not need a wait condition.

consume (buffer: separate BUFFER [INTEGER])
        -- Consume an item from the buffer.
    require
        not (buffer.count = 0)
    local
        consumed_item: INTEGER
    do
        consumed_item := buffer.item
    end
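As a rough analogue only (not how the SCOOP runtime works), the semantics of consume, namely exclusive access to the controlled argument for the whole body plus a wait condition checked under that lock, can be mimicked in Python with a condition variable. The names Buffer, produce, and handler are hypothetical; as in the Eiffel code, item only reads the oldest element.

```python
import threading
from collections import deque


class Buffer:
    """Hypothetical unbounded buffer; its condition variable stands in for the handler lock."""

    def __init__(self) -> None:
        self._items = deque()
        self.handler = threading.Condition()

    # The queries and command below assume the caller holds self.handler.
    def count(self) -> int:
        return len(self._items)

    def item(self) -> int:
        return self._items[0]  # read the oldest element without removing it

    def put(self, n: int) -> None:
        self._items.append(n)


def produce(buffer: Buffer, n: int) -> None:
    """Like a producer feature with buffer as controlled argument and no wait condition."""
    with buffer.handler:
        buffer.put(n)
        buffer.handler.notify_all()  # wait conditions of other clients may now hold


def consume(buffer: Buffer) -> int:
    """Analogue of consume: lock the controlled argument, then wait on the precondition."""
    with buffer.handler:
        # require not (buffer.count = 0): block, releasing the lock, until it holds
        buffer.handler.wait_for(lambda: not (buffer.count() == 0))
        consumed_item = buffer.item()
    return consumed_item
```

The approximation is deliberate: in SCOOP the runtime obtains the lock and checks the wait condition before the body starts, whereas here the body itself waits while holding the condition's lock.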

The runtime system ensures that the result of the call buffer.item is properly assigned to the entity consumed_item using a mechanism called wait by necessity: while the consumer usually does not have to wait for an asynchronous call to finish, it will do so if it needs the result.

The SCOOP concepts require runtime support. The following description is abstract; actual 
implementations may differ. Each processor maintains a request queue of requests resulting 
from feature calls on other processors. A non-separate feature call can be processed right away 
without going through the request queue; the processor creates a non-separate feature request 
for itself and processes it right away using its call stack. The rest of this discussion applies to 
separate feature calls, such as the call on the buffer performed on behalf of the consumer. When 
the client executes such a feature call, it enqueues a separate feature request to the request queue 
of the supplier's handler. The supplier will process the feature requests in the order of queuing. 
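The request queue and wait by necessity can be sketched with one worker thread per processor. This is a minimal sketch under our own assumptions: Processor, Pending, call, and stop are hypothetical names, every separate call returns immediately, and a query's result slot blocks in get only when the client actually needs the value.

```python
import queue
import threading
from typing import Any, Callable, Optional


class Pending:
    """Result slot of a separate call; get() implements wait by necessity."""

    def __init__(self) -> None:
        self._done = threading.Event()
        self._value: Any = None
        self._error: Optional[BaseException] = None

    def finish(self, value: Any = None, error: Optional[BaseException] = None) -> None:
        self._value, self._error = value, error
        self._done.set()

    def get(self) -> Any:
        self._done.wait()  # block only when the result is actually needed
        if self._error is not None:
            raise self._error
        return self._value


class Processor:
    """Hypothetical processor: one thread applying requests from a FIFO request queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._requests: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._serve, name=name, daemon=True)
        self._thread.start()

    def _serve(self) -> None:
        while True:
            request = self._requests.get()
            if request is None:  # shutdown sentinel
                return
            body, pending = request
            try:  # feature application, strictly in the order of queuing
                pending.finish(value=body())
            except Exception as exc:
                pending.finish(error=exc)

    def call(self, body: Callable[[], Any]) -> Pending:
        """Separate feature call: enqueue a request and return immediately."""
        pending = Pending()
        self._requests.put((body, pending))
        return pending

    def stop(self) -> None:
        self._requests.put(None)
        self._thread.join()


items = []  # state of a hypothetical buffer object handled by buffer_handler
buffer_handler = Processor("buffer")
for n in (1, 2, 3):
    buffer_handler.call(lambda n=n: items.append(n))  # commands: the client moves on
snapshot = buffer_handler.call(lambda: list(items))  # a query is queued like any call
print(snapshot.get())  # prints [1, 2, 3]; the client waits here by necessity
buffer_handler.stop()
```

Because a single thread drains the queue in FIFO order, the query observes all three earlier commands, mirroring the rule that the supplier processes feature requests in the order of queuing.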

Whenever a processor is ready to release the obtained locks, i.e., at the end of its current feature application, it issues an unlock request to each locked processor. Each locked processor unlocks itself as soon as it has processed all previous feature requests. In the example, the producer issues an unlock request to the buffer after it has issued a feature request for put.
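The ordering guarantee for unlock requests can be illustrated with a deterministic, single-threaded model. The class LockedProcessor and its fields are hypothetical; the sketch only shows that an unlock request queued behind a feature request takes effect after that feature request has been applied.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple

# A request is (kind, client, feature); kind is "feature" or "unlock".
Request = Tuple[str, str, Optional[str]]


@dataclass
class LockedProcessor:
    """Hypothetical single-threaded model of a supplier locked by one client."""
    locked_by: Optional[str]
    requests: Deque[Request] = field(default_factory=deque)
    applied: List[str] = field(default_factory=list)

    def step(self) -> None:
        """Process the oldest queued request."""
        kind, client, feature = self.requests.popleft()
        if kind == "feature":
            self.applied.append(f"{client}: {feature}")
        elif kind == "unlock" and client == self.locked_by:
            self.locked_by = None  # only now can the lock be granted to another client


# The producer's feature application ends: it queued put, then an unlock request.
buffer = LockedProcessor(locked_by="producer")
buffer.requests.extend([("feature", "producer", "put"), ("unlock", "producer", None)])
buffer.step()  # applies put; the buffer is still locked by the producer
buffer.step()  # reaches the unlock request; the buffer is now free
```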

The runtime system includes a scheduler, which serves as an arbiter between processors. When a processor is ready to process a feature request in its request queue, it can proceed only once the request is satisfiable. In a synchronization step, the processor tries to obtain the locks on the arguments' handlers such that the precondition holds. For this purpose, the processor sends a locking request to the scheduler, which stores the request in a queue and schedules satisfiable requests for application. Once the scheduler satisfies the request, the processor starts an execution step.
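The scheduler's role as arbiter can be sketched as follows, under simplifying assumptions of our own: a locking request (the hypothetical LockingRequest) is satisfiable when none of its handlers is locked and its wait condition, modelled as a Python callable, is true; the scheduler approves the oldest satisfiable request first. Scheduler, schedule, and release are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class LockingRequest:
    """Hypothetical locking request: requester, handlers to lock, wait condition."""
    requester: str
    handlers: Set[str]
    precondition: Callable[[], bool] = lambda: True


class Scheduler:
    """Sketch of the arbiter: queue locking requests, approve satisfiable ones."""

    def __init__(self) -> None:
        self.pending: List[LockingRequest] = []
        self.locked: Dict[str, str] = {}  # handler -> processor holding its lock

    def submit(self, request: LockingRequest) -> None:
        self.pending.append(request)

    def _satisfiable(self, request: LockingRequest) -> bool:
        # All handlers must be free, and the wait condition must hold.
        free = all(h not in self.locked for h in request.handlers)
        return free and request.precondition()

    def schedule(self) -> Optional[LockingRequest]:
        """Approve the oldest satisfiable request; its execution step may then start."""
        for request in self.pending:
            if self._satisfiable(request):
                self.pending.remove(request)
                for h in request.handlers:
                    self.locked[h] = request.requester
                return request
        return None

    def release(self, requester: str) -> None:
        """Drop all locks held by requester (its unlock requests have been processed)."""
        self.locked = {h: p for h, p in self.locked.items() if p != requester}
```

In the producer-consumer example, a consumer request whose wait condition is false stays queued, while a later producer request for the same buffer can be approved first.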



4 Record-replay 

This section presents a record-replay technique for SCOOP programs. The technique is an adaptation of Choi and Srinivasan's [3] approach, developed for Java multithreading. Their notion of logical thread schedules helps keep the size of the log file small. Section 4.1 presents the SCOOP adaptation of logical thread schedules, called logical processor schedules. Sections 4.2 and 4.3 show how the SCOOP runtime records and replays them.



4.1 Logical Processor Schedules 

As discussed in Section 2, a number of effective approaches to the problem of deterministic replay of multithreaded programs exist. For executions on uniprocessor systems, the approach of Russinovich and Cogswell [13] has been shown to outperform techniques that try to record how threads interact. They propose to log thread scheduler information and to enforce the same schedule when a run is replayed. This approach also works well in our case. To minimize the overhead of capturing physical processor schedules (the SCOOP equivalent of physical thread schedules), we adapt the notion of logical thread schedules from [3]. This section describes this adaptation.

Consider a share market application with investors, markets, issuers, and shares. The markets and the investors are handled by different processors. Listing 1 shows the class for the investors. Each investor has a feature to buy a share. To execute it, the investor must wait for the lock on the market and for the precondition to be satisfied.



Listing 1: Investor class 

class INVESTOR feature
    id: INTEGER

    buy (market: separate MARKET; issuer_id: INTEGER)
            -- Buy a share of the issuer on the market.
        require
            market.can_buy (id, issuer_id)
        do
            market.buy (id, issuer_id)
        end

end



The following feature initiates a transaction that involves two investors and one market with 
shares from two issuers: 



do_transaction (first_investor, second_investor: separate INVESTOR; base_issuer_id: INTEGER)
        -- Make the two investors buy two shares from two consecutive issuers on the market.
    local
        next_issuer_id: INTEGER
    do
        first_investor.buy (market, base_issuer_id)
        next_issuer_id := base_issuer_id + 1
        second_investor.buy (market, next_issuer_id)
    end



Figure 1 depicts a number of possible physical processor schedules for this example. The difference between schedules a and b is that in a, the application sets the local variable next_issuer_id after the first investor buys its share from the market, whereas in b the variable is set before this event. In schedule c, the second investor buys its share before the first investor does. Schedules a and b give rise to the same behavior on the market, whereas schedule c reverses the order of the transaction: the second investor gets to buy its share first. The reason is that the timing of local variable updates does not influence shared objects, whereas the order of critical events does. In SCOOP, the only critical events occur in the synchronization step, i.e., when the scheduler approves a locking request. We regard two physical processor schedules as equivalent if the scheduler approves the locking requests in the same order in both. A logical processor schedule denotes an equivalence class of physical processor schedules under this relation. Section 4.2 describes the implementation of logical processor schedules.
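The equivalence relation can be made concrete by projecting a physical schedule onto its critical events. In this sketch, an event is a hypothetical (processor, description, is_critical) triple, and the three traces are simplified illustrations in the spirit of Figure 1, not data taken from it: only the investors' approved locking requests on the market are marked critical.

```python
from typing import List, Tuple

# (processor, description, is_critical); only approved locking requests are critical.
Event = Tuple[str, str, bool]


def logical_schedule(physical: List[Event]) -> List[str]:
    """Project a physical schedule onto the order of approved locking requests."""
    return [processor for processor, _, critical in physical if critical]


def equivalent(a: List[Event], b: List[Event]) -> bool:
    """Two physical schedules are equivalent iff their approval orders coincide."""
    return logical_schedule(a) == logical_schedule(b)


UPDATE = ("application", "next_issuer_id := base_issuer_id + 1", False)
FIRST = ("first investor", "lock on market approved", True)
SECOND = ("second investor", "lock on market approved", True)

schedule_a = [FIRST, UPDATE, SECOND]   # local update after the first purchase
schedule_b = [UPDATE, FIRST, SECOND]   # local update before it
schedule_c = [UPDATE, SECOND, FIRST]   # the second investor buys first
```

Here schedule_a and schedule_b fall into the same logical processor schedule, while schedule_c does not.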



4.2 Recording Logical Processor Schedules 

A logical processor schedule consists of one interval list per processor. An interval list is a sequence of intervals that keeps track of a processor's approved locking requests. The scheduler uses a global counter with value $counter_g$ to number the approved locking requests. An interval $[l, u]$ is defined by a lower global counter value $l$ and an upper global counter value $u$ such that the locking requests with numbers in $[l, u]$ belong to the same processor and no locking request with a number in an adjacent interval belongs to that processor, i.e., the interval is maximal.
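The definition can be checked offline: given the processors in the order in which their locking requests were approved (numbered from 1), each maximal run of consecutive approvals by one processor forms one interval. The function interval_lists is a hypothetical helper; the example order is the one of the deadlocking market run discussed later in this section.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def interval_lists(approvals: List[str]) -> Dict[str, List[Interval]]:
    """Split an approval sequence (numbered 1, 2, ...) into maximal per-processor runs."""
    result: Dict[str, List[Interval]] = {}
    number = 1
    for processor, run in groupby(approvals):  # groupby yields maximal runs
        length = len(list(run))
        result.setdefault(processor, []).append((number, number + length - 1))
        number += length
    return result


order = ["application", "first investor", "Zurich market", "second investor",
         "New York market", "first investor", "Zurich market", "second investor",
         "New York market"]
schedule = interval_lists(order)
# schedule["first investor"] == [(2, 2), (6, 6)]
```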

Once the recorder is activated, the scheduler executes Algorithm 1. To detect when a new interval should start, the scheduler maintains for each processor p a local counter with value $counter_l[p]$ and a local counter base with value $base_l[p]$. The local counter base of p stores the value of the global counter at the point where the scheduler started recording an interval for p. The local counter counts p's locking requests that have been approved since the scheduler started recording the interval for p. Processor p's current interval is then $[base_l[p] + 1,\ base_l[p] + counter_l[p]]$.

Figure 1: Three possible physical processor schedules (a, b, c) for the market example, with lifelines for the application, first investor, second investor, and market processors.

Whenever the scheduler approves a locking request r of a processor p, it goes through the 
following checks. If p's local counter is undefined, then p does not have an interval yet, and thus 
r belongs to a new interval for p. Hence, the scheduler starts recording a new interval for p. 

If p's local counter is defined and $counter_g = base_l[p] + counter_l[p]$, then the scheduler is currently recording an interval for p, and r belongs to this interval. This can be seen as follows. Had the scheduler approved a locking request of any other processor q since it started recording p's interval, it would have incremented the global counter but not p's local counter, and the equation would not hold. Hence, the scheduler has not approved locking requests of other processors in the meantime, and r belongs to p's current interval.

If p's local counter is defined and $counter_g \neq base_l[p] + counter_l[p]$, then the scheduler is currently recording an interval for p, but r belongs to a new interval. This can be seen as follows. Had the scheduler not approved any locking request of another processor q since it started recording p's current interval, only p's approvals would have incremented the global counter, each time together with p's local counter, and the equation would hold. Hence, the scheduler must have approved one or more locking requests of other processors, and r belongs to a new interval for p. In this case, the scheduler finishes p's current interval and adds r to a new interval.

Algorithm 1: Record

upon event ⟨Initialize⟩ do                      // The program starts.
    counter_g := 0;                             // The global counter.
    forall p ∈ processors do
        counter_l[p] := undef;                  // The local counters.
        base_l[p] := undef;                     // The local counter bases.
        intervals[p] := ⟨⟩;                     // The interval lists.
    end

upon event ⟨Approved | p⟩ do                    // The scheduler approved p's request.
    if counter_l[p] = undef then
        base_l[p] := counter_g;
        counter_l[p] := 1;
        counter_g := counter_g + 1;
    else if counter_l[p] ≠ undef ∧ counter_g = base_l[p] + counter_l[p] then
        counter_l[p] := counter_l[p] + 1;
        counter_g := counter_g + 1;
    else if counter_l[p] ≠ undef ∧ counter_g ≠ base_l[p] + counter_l[p] then
        intervals[p] := intervals[p] · [base_l[p] + 1, base_l[p] + counter_l[p]];
        base_l[p] := counter_g;
        counter_l[p] := 1;
        counter_g := counter_g + 1;
    end

upon event ⟨Terminate⟩ do                       // The program terminates.
    forall p ∈ processors do
        if counter_l[p] ≠ undef then
            intervals[p] := intervals[p] · [base_l[p] + 1, base_l[p] + counter_l[p]];
            write(p, intervals[p]);
        end
    end

At the end of the program execution, the scheduler checks for each processor whether there 
is any pending interval, in which case it adds the interval to the respective interval list. 
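Algorithm 1 translates almost line by line into Python. In this sketch, None stands for undef, the class Recorder and its method names are our own, and terminate returns the interval lists instead of writing them to a log; unlike Algorithm 1, it also returns empty lists for processors without approved requests.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Interval = Tuple[int, int]


class Recorder:
    """Sketch of Algorithm 1: online recording of per-processor interval lists."""

    def __init__(self, processors: Iterable[str]) -> None:  # event Initialize
        self.counter_g = 0                                    # global counter
        self.counter_l: Dict[str, Optional[int]] = {}         # local counters
        self.base_l: Dict[str, Optional[int]] = {}            # local counter bases
        self.intervals: Dict[str, List[Interval]] = {}        # interval lists
        for p in processors:
            self.counter_l[p] = None  # None plays the role of undef
            self.base_l[p] = None
            self.intervals[p] = []

    def approved(self, p: str) -> None:
        """Event Approved: the scheduler approved a locking request of p."""
        count = self.counter_l[p]
        if count is None:                                   # p has no interval yet
            self.base_l[p] = self.counter_g
            self.counter_l[p] = 1
        elif self.counter_g == self.base_l[p] + count:      # extend p's current interval
            self.counter_l[p] = count + 1
        else:                                               # other processors intervened
            base = self.base_l[p]
            self.intervals[p].append((base + 1, base + count))
            self.base_l[p] = self.counter_g
            self.counter_l[p] = 1
        self.counter_g += 1                                 # every branch numbers the request

    def terminate(self) -> Dict[str, List[Interval]]:
        """Event Terminate: close every pending interval (call once)."""
        for p, count in self.counter_l.items():
            if count is not None:
                base = self.base_l[p]
                self.intervals[p].append((base + 1, base + count))
        return self.intervals


rec = Recorder(["p", "q"])
for processor in ["p", "p", "q", "p"]:
    rec.approved(processor)
log = rec.terminate()  # p: [(1, 2), (4, 4)], q: [(3, 3)]
```

Factoring the common increment of the global counter out of the three branches keeps the code short without changing the algorithm's behavior.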

Consider again the market example. Assume the investor class has an additional feature buy_alternative, which allows an investor to buy a share if possible; if it is not possible, a backup share is bought. For this reason, each investor has a backup market and the identifier of a backup issuer.
buy_alternative (market: separate MARKET; issuer_id: INTEGER)
        -- Try to buy a share of the issuer on the market.
        -- If this fails, buy some backup share on the backup market.
    do
        if market.can_buy (id, issuer_id) then
            market.buy (id, issuer_id)
        else
            buy (backup_market, backup_issuer_id)
        end
    end



Consider the setup in Figure 2. Assume that a new transaction asks each investor to buy at least one share of the software company by calling buy and then buy_alternative. The schedule in Figure 3 leads to a deadlock because each investor holds a lock on one market while trying to lock the other market; however, not all possible schedules exhibit the problem. The proposed technique produces the following logical processor schedule: application: [1,1]; first investor: [2,2] · [6,6]; second investor: [4,4] · [8,8]; Zurich market: [3,3] · [7,7]; and New York market: [5,5] · [9,9]. Section 4.3 shows how to replay this logical processor schedule to reproduce the deadlock.

Figure 2: Object structure for the market example. The first and second investors (INVESTOR) each have issuer_id = software, a market, backup_issuer_id = construction, and a backup_market (both separate MARKET); the Zurich market and the New York market (MARKET) each have available_shares: BAG[INTEGER] = {software, construction}.



4.3 Replaying Logical Processor Schedules 

To replay a logical processor schedule, the scheduler again uses a global counter with value $counter_g$; this time, the global counter holds the number of the locking request that the scheduler wants to approve next. To replay, the scheduler executes Algorithm 2.

To begin, the scheduler gets ready to approve the first locking request. Whenever the scheduler is about to approve a locking request r of a processor p, it first checks whether r is next. To do so, the scheduler consults p's interval list and checks whether it contains an interval $[l, u]$ with $l \le counter_g \le u$. If the interval list contains such an interval, the scheduler approves the locking request and gets ready to approve the next one, i.e., it increments the global counter. Otherwise, the scheduler leaves r waiting and tries another locking request.
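The check can be sketched in a few lines. Replayer and check are hypothetical names; returning True or False stands in for triggering ⟨Ok⟩ or ⟨NotOk⟩, and the interval lists are passed in directly instead of being read from a log.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


class Replayer:
    """Sketch of Algorithm 2: approve a request only if it is next in the recorded order."""

    def __init__(self, intervals: Dict[str, List[Interval]]) -> None:
        self.counter_g = 1           # number of the locking request to approve next
        self.intervals = intervals   # interval lists, e.g., read from the log

    def check(self, p: str) -> bool:
        """Event Check: True means Ok (approve and advance), False means NotOk."""
        if any(l <= self.counter_g <= u for l, u in self.intervals.get(p, [])):
            self.counter_g += 1
            return True
        return False


recorded = {"application": [(1, 1)], "first investor": [(2, 2), (6, 6)],
            "second investor": [(4, 4), (8, 8)], "Zurich market": [(3, 3), (7, 7)],
            "New York market": [(5, 5), (9, 9)]}
replayer = Replayer(recorded)
replayer.check("application")     # Ok: request 1 belongs to the application
replayer.check("second investor") # NotOk: request 2 belongs to the first investor
```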



To replay the logical processor schedule from Section 4.2, the scheduler initializes the global counter to 1. As soon as the application sends a locking request, the scheduler approves it and increments the global counter to 2. The first two calls on the investors cause each of them to send a locking request. The scheduler lets the first investor proceed and sets the global counter to 3. The second investor must wait because its interval list does not contain the current global counter value. The first investor calls the Zurich market, whose locking request the scheduler approves right away. Now the global counter is at 4, and the scheduler lets the second investor and then the New York market proceed. As a consequence, the global counter reaches 6. In the meantime, the application has performed two more calls to the investors. In sequence, the scheduler approves the locking requests of the first investor, the Zurich market, the second investor, and the New York market. The deadlock is therefore guaranteed to occur again.

Figure 3: A physical processor schedule of the market example in detail, with lifelines for the application, first investor, second investor, Zurich market, New York market, and scheduler. The numbers next to the scheduler lifeline indicate the approved locking requests.
Algorithm 2: Replay

upon event ⟨Initialize⟩ do                      // The program starts.
    counter_g := 1;                             // The global counter.
    forall p ∈ processors do
        intervals[p] := read(p);                // The interval lists.
    end

upon event ⟨Check | p⟩ do                       // The scheduler checks on p's request.
    if ∃ [l, u] ∈ intervals[p] : l ≤ counter_g ≤ u then
        counter_g := counter_g + 1;
        trigger ⟨Ok⟩;                           // The request is next.
    else
        trigger ⟨NotOk⟩;                        // The request is not next.
    end
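To see why replay is deterministic, the following self-contained simulation (our own simplification, not the SCOOP runtime) records one randomly arbitrated run and replays it under differently seeded arbiters. It assumes that every processor has a fixed number of locking requests that are always ready, ignoring the causal dependencies of a real program; record computes the interval lists offline and yields the same maximal intervals as Algorithm 1, and the names record and run are hypothetical.

```python
import random
from itertools import groupby
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]


def record(order: List[str]) -> Dict[str, List[Interval]]:
    """Interval lists of an approval order (same maximal runs as Algorithm 1)."""
    intervals: Dict[str, List[Interval]] = {}
    n = 1
    for p, same in groupby(order):
        k = len(list(same))
        intervals.setdefault(p, []).append((n, n + k - 1))
        n += k
    return intervals


def run(requests: Dict[str, int], rng: random.Random,
        log: Optional[Dict[str, List[Interval]]] = None) -> List[str]:
    """Approve all requests in an order chosen by a random arbiter.

    With a log, a candidate is approved only if the replay check of Algorithm 2 says Ok.
    """
    remaining = dict(requests)
    counter_g = 1
    order: List[str] = []
    while any(remaining.values()):
        waiting = [p for p, k in remaining.items() if k > 0]
        rng.shuffle(waiting)  # nondeterministic arbitration between processors
        for p in waiting:
            if log is None or any(l <= counter_g <= u for l, u in log.get(p, [])):
                order.append(p)
                remaining[p] -= 1
                counter_g += 1
                break
        else:
            raise RuntimeError("the log does not match the pending requests")
    return order


requests = {"application": 1, "first investor": 2, "second investor": 2,
            "Zurich market": 2, "New York market": 2}
original = run(requests, random.Random(1))      # recording run
log = record(original)
replayed = run(requests, random.Random(2), log)  # replay under another arbiter
```

Whatever order the shuffled arbiter proposes, only the processor owning the interval that contains the current counter value passes the check, so replayed equals original.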



5 Conclusion 

While the SCOOP model protects developers from introducing data races, its runtime system is complex; this makes errors such as deadlocks hard to analyze without the ability to reproduce them. We introduced a record-replay technique to record and reproduce the execution of SCOOP programs. The technique uses the idea of logical thread schedules [3] to abstract from non-critical events. The simplicity of the SCOOP model helped to apply this technique: the approvals of locking requests are the only relevant critical events.

The ability to replay executions using logical processor schedules is an important component for testing SCOOP programs. In future work, schedules may be generated in order to drive programs systematically through different orders.